FIX: Set fs.automatic.close to false in Hadoop configuration #614
Problem
The Hadoop DFS client automatically closes all created clients when the JVM shuts down. This can happen before the connector itself is closed, yet the connector needs these clients to stay open so it can delete its temporary files on close (a sketch of that cleanup path is shown below). Note that this error only appears when the Connect worker is stopped or restarted.
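For context, a minimal sketch of the kind of cleanup the connector performs on close; the class name, method name, and temp path below are illustrative assumptions, not the connector's actual code. The point is that the `FileSystem` client must still be open when this runs:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TempFileCleaner {

  // Hypothetical temp directory where the connector keeps in-progress files.
  private static final String TEMP_DIR = "/connector/+tmp";

  /**
   * Deletes the connector's temporary files and then closes the client.
   * If Hadoop's shutdown hook has already closed the FileSystem, the
   * delete() call fails because the underlying client is no longer open.
   */
  public static void cleanUpOnClose(FileSystem fs) throws IOException {
    fs.delete(new Path(TEMP_DIR), true /* recursive */);
    fs.close();
  }
}
```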
Solution
Hadoop has a configuration property, `fs.automatic.close`, which defaults to `true`. Setting it to `false` disables this behaviour. This is safe because we do not rely on the automatic close feature to close the clients anyway: the connector lifecycle has its own routines for closing them on exit, since it must rely on the lifecycle during a connector delete, where the JVM shutdown hooks are not executed. The connector should therefore always operate with `fs.automatic.close` set to `false`.

Does this solution apply anywhere else?
If yes, where?
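A minimal sketch of how the connector could apply this setting when creating its HDFS client; the factory class and method names are illustrative assumptions, not the connector's actual code:

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsClientFactory {

  /**
   * Creates a FileSystem client with fs.automatic.close disabled so that
   * Hadoop's JVM shutdown hook does not close it before the connector's own
   * close routine has deleted its temporary files.
   */
  public static FileSystem newFileSystem(URI hdfsUri, Configuration baseConf)
      throws IOException {
    Configuration conf = new Configuration(baseConf);
    // Disable the shutdown-hook driven auto close; the connector lifecycle
    // is responsible for closing this client explicitly.
    conf.setBoolean("fs.automatic.close", false);
    // newInstance() bypasses the shared FileSystem cache, so the client's
    // lifetime is owned entirely by the connector.
    return FileSystem.newInstance(hdfsUri, conf);
  }
}
```

Using `FileSystem.newInstance(...)` rather than `FileSystem.get(...)` avoids handing back a cached, shared client, which keeps the close-on-exit responsibility squarely with the connector lifecycle described above.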
Test Strategy
Testing done:
Release Plan